4 results for Classification and Regression Trees

in DigitalCommons@University of Nebraska - Lincoln


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Where the creation, understanding, and assessment of software testing and regression testing techniques are concerned, controlled experimentation is an indispensable research methodology. Obtaining the infrastructure necessary to support such experimentation, however, is difficult and expensive. As a result, progress in experimentation with testing techniques has been slow, and empirical data on the costs and effectiveness of techniques remains relatively scarce. To help address this problem, we have been designing and constructing infrastructure to support controlled experimentation with testing and regression testing techniques. This paper reports on the challenges faced by researchers experimenting with testing techniques, including those that inform the design of our infrastructure. The paper then describes the infrastructure that we are creating in response to these challenges, and that we are now making available to other researchers, and discusses the impact that this infrastructure has and can be expected to have.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage every day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to delete when storage space is required. CMS data is stored using HDFS (Hadoop Distributed File System), and the HDFS logs record file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time needed to classify data sets with this method depends on the size of the input HDFS log file, since the Hadoop MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost, which is calculated from the size of a data set and the time since its last access to help decide how useful the data set is. The accuracy of the two approaches was compared by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our methodology using SVMs proved to be more accurate than the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
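The comparison described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the Retention Cost formula, so the product of size and time since last access used below is only an assumed stand-in, and the names `DataSet`, `retention_cost`, `select_for_deletion`, and `later_access_rate` are hypothetical. The evaluation measure follows the abstract's wording: the percentage of data sets predicted for deletion that were accessed again later, where a lower value indicates a better prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    size_tb: float                 # size on disk, in terabytes
    days_since_last_access: float  # time since the most recent access


def retention_cost(ds: DataSet) -> float:
    # ASSUMPTION: the abstract says only that the heuristic uses size and
    # time since last access; this product is an illustrative stand-in.
    return ds.size_tb * ds.days_since_last_access


def select_for_deletion(datasets, space_needed_tb):
    """Pick data sets in descending cost order until enough space is freed."""
    chosen, freed = [], 0.0
    for ds in sorted(datasets, key=retention_cost, reverse=True):
        if freed >= space_needed_tb:
            break
        chosen.append(ds)
        freed += ds.size_tb
    return chosen


def later_access_rate(predicted, later_accessed_names):
    """Percentage of data sets predicted for deletion that were accessed later."""
    if not predicted:
        return 0.0
    hits = sum(1 for ds in predicted if ds.name in later_accessed_names)
    return 100.0 * hits / len(predicted)
```

Any classifier, including an SVM, could be scored with `later_access_rate` in the same way by passing in the data sets it marks for deletion.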

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This register lists the largest trees of over 80 species identified in Nebraska. The name of the owner and nominator, size and location of each tree follow each listing. Many people across Nebraska have worked hard to make this register as comprehensive and accurate as possible, but the quest to find the largest trees in Nebraska is never over. Champion trees are by nature old, and old trees diminish and die. Larger trees are newly discovered. Thus, this list continually changes as new nominations are submitted.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study compares the information-seeking behavior of Bachelor of Science and Master of Science students in the fields of agricultural extension and education. The authors surveyed Iranian students in departments of agricultural extension and education at four universities in Tehran, Shiraz, Mollasani, and Kermanshah. The study focused on four aspects: (1) a comparison of the amount of information-seeking behavior between Bachelor of Science and Master of Science agricultural extension and education students; (2) a comparison of the varieties of information-seeking behavior in Bachelor of Science and Master of Science agricultural extension and education students; (3) a comparison of the amount of information resources available at the four universities and their effect on students' information-seeking behavior; and (4) a comparison of research and educational outputs in Bachelor of Science and Master of Science students. The scale-free technique, the division-by-mean method, principal components analysis, the Delphi method, the t-test, and correlation and regression tools were used for data analysis. The study revealed that Bachelor of Science students' information-seeking behavior is aimed at improving educational output, whereas Master of Science students' information-seeking behavior is aimed at promoting research output. In terms of Internet searching skills, library searching skills, and awareness of library information-seeking methods as they relate to information-seeking behavior, there were no significant differences between the two groups of students.